Network Working Group X. Lee 


Request for Comments: 4713 W. Mao 
Category: Informational CNNIC 
E. Chen 

N. Hsu 

TWNIC 


J. Klensin 
October 2006 


Registration and Administration Recommendations for Chinese Domain Names 
Status of This Memo 


This memo provides information for the Internet community. It does 
not specify an Internet standard of any kind. Distribution of this 
memo is unlimited. 


Copyright Notice 
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IESG Note 


This RFC is not a candidate for any level of Internet Standard. The 
IETF disclaims any knowledge of the fitness of this RFC for any 
purpose and in particular notes that the decision to publish is not 
based on IETF review for such things as security, congestion control, 
or inappropriate interaction with deployed protocols. The RFC Editor 
has chosen to publish this document at its discretion. Readers of 
this document should exercise caution in evaluating its value for 
implementation and deployment. See RFC 3932 for more information. 


Abstract 


Many Chinese characters in common use have variants, which makes most 
of the Chinese Domain Names (CDNs) have at least two different forms. 
The equivalence between Simplified Chinese (SC) and Traditional 
Chinese (TC) characters is very important for CDN registration. This 
memo builds on the basic concepts, general guidelines, and framework 
of RFC 3743 to specify proposed registration and administration 
procedures for Chinese domain names. The document provides the 
information needed for understanding and using the tables defined in 
the IANA table registrations for Simplified and Traditional Chinese. 
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1. Introduction 


With the standardization of Internationalized Domain Names for 
Application (IDNA, described in [RFC3490], [RFC3491], and [RFC3492]), 
internationalized domain names (IDNs), i.e., those that contain non- 
ASCII characters, are included in the DNS, and users can access the 
Internet with their native languages, most of which are not English. 
However, many languages have special requirements, which are not 
addressed in the IDNA RFCs. One way to deal with some of the 
remaining issues involves grouping characters that could be confused 
together as "variants". The variant approach is discussed in RFC 
4290 [RFC4290] and specifically for documents written in Chinese, 
Japanese, or Korean (CJK documents), in the so-called "JET 
Guidelines" RFC 3743 [RFC3743]. Readers of this document are assumed 
to be familiar with the concepts and terminology of the latter. The 
guidelines specified in this document provide a set of specific 
tables and methods required to apply the JET Guidelines to Chinese 
characters. For example, changes were made in the forms of a large 
number of Chinese characters during the last century to simplify 
writing and reading. These "Simplified" characters have been adopted 
in some Chinese-speaking communities, while others continue to use 
the "Traditional" forms. On the global Internet, if IDNA were used 
alone, there would be considerable potential for confusion if the two 
forms were not considered together. Consequently, effective use of 
Chinese Domain Names (CDNs) requires variant equivalence, as 
described in RFC 3743, to handle character differences between 
Simplified and Traditional Chinese forms. 
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Chinese variant equivalence itself is very complicated in principle 
(please read [C2C] for further information). When it comes to the 
usage of Chinese domain names, the basic requirement is to match the 
user perception of Chinese characters between Simplified Chinese (SC) 
and Traditional Chinese (TC) forms. When users register SC or TC 
domain names, they will wish to obtain the other forms (Traditional 
or Simplified, respectively) as well, and expect others to be able to 
access the website or other resources in both forms. 


This document specifies a solution for Chinese domain name 
registration and administration that has been adopted and deployed by 
CNNIC (the top-level domain registry for "CN") and TWNIC (the top- 
level domain registry for "TW") to manage Simplified Chinese and 
Traditional Chinese domain name equivalence. In the terminology of 
RFC 3743, this solution is based on Internationalized Domain Labels 
(IDLs). 


Terminology 


This document adopts the terminologies that are defined in RFC 3743. 
It is not possible to understand this document without first 
understanding the concepts and terminology or RFC 3743, including 
terminology introduced in its examples. Additional terminology is 
defined later in this document. 


Chinese Characters 


This document suggests permitting only a subset of Chinese characters 
in Chinese Domain Names (CDNs) and hence in the DNS. When this 
document discusses Chinese characters, it only refers to the subset 
of the characters in the first column of the current IANA 
registration tables for Chinese as discussed in Section 2.3 and 
Section 2.4. These are defined, in detail, in [LVT-SC] and [LVT-TC]. 
Of course, characters excluded from these tables are still valid 
Chinese characters. However, this document strongly suggests that 
registries do not permit any registration of Chinese characters that 
are not listed in the tables. The tables themselves will be updated 
in the future if necessary. 


Chinese Domain Name Label (CDNL) 


If an IDN label includes at least one Chinese character, it is called 
a Chinese Domain Name (CDN) Label. CDN labels may contain characters 
from the traditional letter-digit-hyphen (LDH) set as well as Chinese 
characters. 
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2.3. Simplified Chinese Variant Table (SCVT) 


Based on RFC 3743 [RFC3743], a language table for Simplified Chinese 
has been defined [LVT-SC]. It can be used for the registration of 
Simplified Chinese domain names. The key feature of this table is 
that the preferred variant is the SC character, which is used by 
Chinese mainland users or defined in Chinese-related standards. 


2.4. Traditional Chinese Variant Table (TCVT) 


Similarly, a language table has been defined for Traditional Chinese 

[LVT-TC]. It is also based on the rules of RFC 3743. It can be used 
for registration of Traditional Chinese domain names. The preferred 

variant is the TC character, which is used by Taiwan users or defined 
in related standards. 


2.5. Original Chinese Domain Name Label (OCDNL) 

The Chinese Domain Name Label that users submit for registration. 
3. Procedure for Registration of Chinese Domain Name Labels 
3.1. Terminology and Context 


This document adopts the same procedure for Chinese Domain Name Label 
(CDNL) registration as the one defined for more general IDN labels in 
section 3.2.3 of RFC 3743 [RFC3743]. The terminology and notation 
used below, and the steps that are mentioned, derive from that 
document. In particular, "CV" is the character variant associated 
with an input character ("IN") and a language table. The language 
tables used here are those for Chinese as spoken and written in the 
Chinese mainland (ZH-CN) and on Taiwan (ZH-TW). "PV" is the selected 
Preferred Variant. 


3.2. Procedure in Terms of the RFC 3743 Model 


The first column of the Simplified Chinese Variant Table (SCVT) is 
the same as the first column of the corresponding Traditional Chinese 
Variant Table (TCVT) and so are the third columns of both tables. 
Consequently, the CV(IN, ZH-CN) will be same as the CV(IN, ZH-TW) 
after Step 3; the PV(IN, ZH-CN) is in SC form, and the PV(IN, ZH-TW) 
is in TC form. As a result, there will not be more than three 
records (i.e., for the original label (OCDNL), the Simplified Chinese 
(SC) form, and the Traditional Chinese (TC) form) to be added into 
the zone file after applying this procedure. In other words, the 
procedure does not generate labels that contain a mixture of 
Simplified and Traditional Chinese as variants. 
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The set of languages associated with the input (IN) is both ZH-CN and 
ZH-TW by default. The procedure for CDNL registration uses the 
optional registry-defined rules provided in RFC 3743 for optional 
processing, with the understanding that the rules may vary for 
different registries supporting CDNs. The motivation for such rules 
is described below. 


The preferred variant(s) is/are TC in TCVT, and SC in SCVT. There 
may be more than one preferred variant for a given valid character. 


3.3. RFC 3743 Optional Registry Processing 


In actuality, while IDNA, and hence RFC 3743, process characters one 
at a time, the actual relationship between the valid code point and 
the preferred variant is contextual: whether one character can be 
substituted for another depends on the characters with which it is 
associated in a label or, more generally, in a phrase. In 
particular, some of the preferred variants make no sense in 
combination with other characters; therefore, those combinations 
should not be added into the Zone file (described as "ZV" or zone 


variants in RFC 3743). If desired, it should be possible to define 
and implement rules to reduce the preferred variant labels to only 
plausible ones. This could be done, for example, with some 


artificial intelligence tools, or with feedback from the registrant, 
or with selection based on frequency of occurrence in other texts. 

To illustrate one possibility, the OCDNL could be required to be TC- 
only or SC-only, and if there is more than one preferred variant, the 
OCDNL will be used as the PV, instead of the PV produced by the 
algorithm. 


To reemphasize, the tables in [LVT-SC] and [LVT-TC] follow the table 
format and terminologies defined in [RFC3743]. If one intends to 
implement Chinese domain name registrations based on these two tables 
or ones similar to them, a complete understanding of RFC 3743 is 
needed for the proper use of those tables. 


4. Security Considerations 


This document is subject to the same security considerations as RFC 
3743, which defines the table formats and operations. As with that 
base document, part of its intent is to reduce the security problems 
that might be caused by confusion among characters with similar 
appearances or meanings. While it will not introduce any additional 
security issues, additional registration restrictions such as those 
outlined in Section 3 may further reduce potential problems. 
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